Chinese Chunking with Tri-training Learning
Authors
Abstract
This paper presents a practical tri-training method for Chinese chunking that uses a small amount of labeled training data and a much larger pool of unlabeled data. We propose a novel selection method for tri-training in which newly labeled sentences are selected by comparing the agreement among three classifiers. Specifically, in each iteration, a new sample is selected for a classifier if the other two classifiers agree on its labels while the classifier itself disagrees. We compare the proposed tri-training approach with a co-training approach on the UPenn Chinese Treebank V4.0 (CTB4). The experimental results show that the proposed approach improves performance significantly.
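The selection rule described in the abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `select_new_samples`, the data layout, and the choice to require agreement on the whole label sequence of a sentence are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Labels = Tuple[str, ...]  # one chunk tag per token of a sentence


def select_new_samples(
    predictions: Sequence[Sequence[Labels]],
) -> Dict[int, List[Tuple[int, Labels]]]:
    """Pick newly labeled sentences for each of three classifiers.

    predictions[k][s] is the tag sequence classifier k assigns to
    unlabeled sentence s (hypothetical layout, assumed for this sketch).
    Sentence s is selected for classifier `target` when the other two
    classifiers output identical tag sequences and `target` does not.
    """
    if len(predictions) != 3:
        raise ValueError("tri-training needs exactly three classifiers")

    selected: Dict[int, List[Tuple[int, Labels]]] = {0: [], 1: [], 2: []}
    for s in range(len(predictions[0])):
        for target in range(3):
            a, b = [k for k in range(3) if k != target]
            agreed = tuple(predictions[a][s])
            # Assumption: agreement is checked on the full sentence.
            if agreed == tuple(predictions[b][s]) and agreed != tuple(predictions[target][s]):
                selected[target].append((s, agreed))
    return selected


# Toy predictions for three unlabeled two-token sentences.
preds = [
    [("B-NP", "I-NP"), ("B-NP", "B-VP"), ("B-NP", "O")],  # classifier 0
    [("B-NP", "I-NP"), ("B-NP", "B-VP"), ("B-VP", "O")],  # classifier 1
    [("B-NP", "I-NP"), ("B-NP", "I-NP"), ("O", "O")],     # classifier 2
]
new_samples = select_new_samples(preds)
```

In the toy data, only the second sentence is selected, and only for classifier 2: classifiers 0 and 1 agree on it while classifier 2 disagrees. Sentences on which all three agree, or on which no two agree, add nothing to any training set under this rule.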
Similar resources
An Empirical Study of Chinese Chunking
In this paper, we describe an empirical study of Chinese chunking on a corpus, which is extracted from UPENN Chinese Treebank-4 (CTB4). First, we compare the performance of the state-of-the-art machine learning models. Then we propose two approaches in order to improve the performance of Chinese chunking. 1) We propose an approach to resolve the special problems of Chinese chunking. This approa...
Semi-supervised Sequence Labeling for Named Entity Extraction based on Tri-Training: Case Study on Chinese Person Name Extraction
Named entity extraction is a fundamental task for many knowledge engineering applications. Existing studies rely on annotated training data, which is quite expensive when used to obtain large data sets, limiting the effectiveness of recognition. In this research, we propose an automatic labeling procedure to prepare training data from structured resources which contain known named entities. Whi...
Exploiting Chunk-level Features to Improve Phrase Chunking
Most existing systems solved the phrase chunking task with the sequence labeling approaches, in which the chunk candidates cannot be treated as a whole during parsing process so that the chunk-level features cannot be exploited in a natural way. In this paper, we formulate phrase chunking as a joint segmentation and labeling task. We propose an efficient dynamic programming algorithm with pruni...
A Boosted Semi-Markov Perceptron
This paper proposes a boosting algorithm that uses a semi-Markov perceptron. The training algorithm repeats the training of a semi-Markov model and the update of the weights of training samples. In the boosting, training samples that are incorrectly segmented or labeled have large weights. Such training samples are aggressively learned in the training of the semi-Markov perceptron because the w...
中文名詞組的辨識:監督式與半監督式學習法的實驗 (Chinese NP Chunking: Experiments with Supervised, and Semisupervised Learning) [In Chinese]
This paper utilizes Yamcha, an SVM tool designed by Taku Kudo, to train an NP-chunking model for Chinese. In addition to IOB and two words surrounding the focused word, we experimented on new features and exploited unlabeled data from web pages to enhance the previous model. Our experiments with supervised learning indicate that our chosen feature sets outperform those reported in previous studi...